A Dataset for Joint Noun-Noun Compound Bracketing and Interpretation
Abstract
We present a new, sizeable dataset of noun–noun compounds with their syntactic analysis (bracketing) and semantic relations. Derived from several established linguistic resources, such as the Penn Treebank, our dataset enables experimenting with new approaches towards a holistic analysis of noun–noun compounds, such as joint learning of noun–noun compound bracketing and interpretation, as well as integrating compound analysis with other tasks such as syntactic parsing.
Similar resources
Search Engine Statistics Beyond the n-Gram: Application to Noun Compound Bracketing
In order to achieve the long-range goal of semantic interpretation of noun compounds, it is often necessary to first determine their syntactic structure. This paper describes an unsupervised method for noun compound bracketing which extracts statistics from Web search engines using a χ² measure, a new set of surface features, and paraphrases. On a gold standard, the system achieves results of 89....
A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation
The automatic interpretation of noun-noun compounds is an important subproblem within many natural language processing applications and is an area of increasing interest. The problem is difficult, with disagreement regarding the number and nature of the relations, low inter-annotator agreement, and limited annotated data. In this paper, we present a novel taxonomy of relations that integrates p...
Multiword noun compound bracketing using Wikipedia
This research suggests two contributions in relation to the multiword noun compound bracketing problem: first, demonstrate the usefulness of Wikipedia for the task, and second, present a novel bracketing method relying on a word association model. The intent of the association model is to represent combined evidence about the possibly lexical, relational or coordinate nature of links between al...
Linked Open Data and Web Corpus Data for noun compound bracketing
This research provides a comparison of a linked open data resource (DBpedia) and web corpus data resources (Google Web Ngrams and Google Books Ngrams) for noun compound bracketing. Large corpus statistical analysis has often been used for noun compound bracketing, and our goal is to introduce a linked open data (LOD) resource for such a task. We show its particularities and its performance on the...
Scaling Up BioNLP: Application of a Text Annotation Architecture to Noun Compound Bracketing
We describe the use of the Layered Query Language and architecture to acquire statistics for natural language processing applications. We illustrate the system’s use on the problem of noun compound bracketing using MEDLINE.